To start, you can read a bit about a fun relationship network to visualize: Game of Thrones!
| Objective | Complete |
|---|---|
| Summarize the concepts of distance matrix and network visualization | |
| Create a distance matrix for a given dataset | |
| Create nodes and edges dataframes | |
| Build and customize a Network HTMLwidget | |
The htmlwidgets package provides a framework for easily creating R bindings to JavaScript libraries
HTML widgets are useful because they can be:
Some popular packages based on htmlwidgets are leaflet for maps, dygraphs for time series, and rthreejs for interactive 3D graphics
In this module we will use visNetwork to create an HTML widget for network visualization
Networks are a collection of connected objects:
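We use the distance between two nodes to describe how connected they are
For N nodes, the distance matrix is an N * N matrix where each value corresponds to the distance between a pair of nodes
Properties of the distance matrix:
- it is symmetric: the distance from node i to node j equals the distance from node j to node i
- its diagonal is 0: each node is at distance 0 from itself
Thus we can capture all the required information using only the lower triangle of the matrix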
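The lower-triangle idea can be checked on a tiny example. The sketch below uses three made-up points (not course data) and base R only; the names `toy_points`, `toy_dist`, and `toy_matrix` are hypothetical.

```r
# Three toy points in 2D (made-up values for illustration only).
toy_points = data.frame(x = c(0, 3, 0),
                        y = c(0, 4, 1))

# `dist` stores only the lower triangle: N * (N - 1) / 2 values.
toy_dist = dist(toy_points)
length(toy_dist)

# Expand to the full N * N matrix to check the properties.
toy_matrix = as.matrix(toy_dist)
isSymmetric(toy_matrix)       # symmetric: d(i, j) equals d(j, i)
all(diag(toy_matrix) == 0)    # zero diagonal: d(i, i) is 0
```

For N = 3 the lower triangle holds 3 values, while the full matrix holds 9, so the saving grows quickly with N.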
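visNetwork is an R package built on htmlwidgets; it needs two inputs:
- a nodes dataframe with an id column
- an edges dataframe with from and to columns
library(visNetwork)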
?visNetwork
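As a rough illustration of the two inputs, here is a minimal sketch with made-up nodes and edges. The names `toy_nodes`, `toy_edges`, and `toy_network` are hypothetical, and the widget is only built if visNetwork happens to be installed.

```r
# Toy nodes dataframe: must contain an `id` column.
toy_nodes = data.frame(id = c("a", "b", "c"),
                       stringsAsFactors = FALSE)

# Toy edges dataframe: must contain `from` and `to` columns.
toy_edges = data.frame(from = c("a", "a"),
                       to   = c("b", "c"),
                       stringsAsFactors = FALSE)

# Every edge endpoint should match an id in the nodes dataframe.
all(c(toy_edges$from, toy_edges$to) %in% toy_nodes$id)

# Build the widget only if visNetwork is installed.
if (requireNamespace("visNetwork", quietly = TRUE)) {
  toy_network = visNetwork::visNetwork(toy_nodes, toy_edges)
  toy_network
}
```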
| Objective | Complete |
|---|---|
| Summarize the concepts of distance matrix and network visualization | ✔ |
| Create a distance matrix for a given dataset | |
| Create nodes and edges dataframes | |
| Build and customize a Network HTMLwidget | |
Let main_dir be the variable corresponding to your skillsoft folder on your Desktop
# Set `main_dir` to the location of your `skillsoft` folder (for Mac/Linux).
main_dir = "~/Desktop/skillsoft"
# Set `main_dir` to the location of your `skillsoft` folder (for Windows).
main_dir = "C:/Users/[username]/Desktop/skillsoft"
# Make `data_dir` from the `main_dir` and
# remainder of the path to data directory.
data_dir = paste0(main_dir, "/data")
# Do the same for your 'plot_dir' which is where your interactive plots will be stored.
plot_dir = paste0(main_dir, "/plots")
library(htmlwidgets)
library(tidyverse)
library(broom)
library(dplyr)
library(visNetwork)
We will use two datasets today:
The data dictionaries can be found through the links above.
The given dataset contains variables about:
The Target variable has four categories:
# Read in the Costa Rican dataset.
setwd(data_dir)
costa = read.csv("costa_rica_poverty.csv",header = TRUE)
# View the dimensions of the dataset.
dim(costa)
[1] 9557 84
# View first few rows and columns
costa[1:5,1:10]
household_id ind_id rooms tablet males_under_12 males_over_12 males_tot
1 21eb7fcc1 ID_279628684 3 0 0 1 1
2 0e5d7a658 ID_f29eb3ddd 4 1 0 1 1
3 2c7317ea8 ID_68de51c94 8 0 0 0 0
4 2b58d945f ID_d671db89c 5 1 0 2 2
5 2b58d945f ID_d56d6f5f5 5 1 0 2 2
females_under_12 females_over_12 females_tot
1 0 0 0
2 0 0 0
3 0 1 1
4 1 1 2
5 1 1 2
# Subset one region.
costa_subset_target = subset(costa, region_central == 1)
# Subset a few columns.
costa_small = costa_subset_target %>%
  select(rooms, ppl_total,
         monthly_rent, Target)
# View the first few rows of the dataset.
head(costa_small)
rooms ppl_total monthly_rent Target
1 3 1 190000 4
2 4 1 135000 4
3 8 1 NA 4
4 5 4 180000 4
5 5 4 180000 4
6 5 4 180000 4
# Remove rows with NA.
costa_small = na.omit(costa_small)
# We keep only the unique rows since duplicate rows would have a distance of 0.
costa_small = unique(costa_small)
head(costa_small)
rooms ppl_total monthly_rent Target
1 3 1 190000 4
2 4 1 135000 4
4 5 4 180000 4
8 2 4 130000 4
12 3 2 100000 4
16 2 4 90000 4
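We can create the distance matrix with the dist function
dist takes a numeric matrix or dataframe and computes the distances between its rows (Euclidean by default)
?dist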
# Create distance matrix.
costa_distance = dist(costa_small)
# `dist` returns the lower triangle of the distance matrix as a vector.
head(costa_distance)
[1] 55000 10000 60000 90000 100000 25000
# Normalize the distances to values between 0 and 1.
costa_distance = costa_distance/max(costa_distance)
head(costa_distance)
[1] 0.023369678 0.004249033 0.025494194 0.038241292 0.042490324 0.010622581
# Use 1 - distance to obtain the value for similarity.
costa_sim = 1-costa_distance
head(costa_sim)
[1] 0.9766303 0.9957510 0.9745058 0.9617587 0.9575097 0.9893774
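The first distances printed above (55000, 10000, 60000) are almost exactly the differences in monthly_rent, because that column is on a much larger scale than rooms or ppl_total. One possible variation, not part of the original walkthrough, is to standardize the columns before calling dist. The sketch below uses a toy stand-in for costa_small built from the rows printed earlier; leaving Target out of the distance is an assumption made here, and `toy_costa` and the other `toy_` names are hypothetical.

```r
# Toy stand-in for `costa_small` (values copied from the rows printed above).
toy_costa = data.frame(rooms        = c(3, 4, 5, 2),
                       ppl_total    = c(1, 1, 4, 4),
                       monthly_rent = c(190000, 135000, 180000, 130000))

# Unscaled: distances are essentially differences in monthly_rent.
round(head(dist(toy_costa), 3))

# Scaled: each column is centered and divided by its standard deviation,
# so all three variables contribute on a comparable scale.
toy_scaled = scale(toy_costa)
toy_dist_scaled = dist(toy_scaled)

# Normalize to [0, 1] and convert to similarity, as in the steps above.
toy_sim_scaled = 1 - toy_dist_scaled / max(toy_dist_scaled)
range(toy_sim_scaled)
```

Whether scaling is appropriate depends on whether rent should dominate the notion of similarity; the unscaled version above effectively groups households by rent alone.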
| Objective | Complete |
|---|---|
| Summarize the concepts of distance matrix and network visualization | ✔ |
| Create a distance matrix for a given dataset | ✔ |
| Create nodes and edges dataframes | |
| Build and customize a Network HTMLwidget | |
The edge dataframe tells visNetwork:
- from which node to which node to draw an edge
- the value (the thickness) of the edge
We need to transform our similarity matrix into an edge dataframe
We can do this using the tidy function from the broom package, which turns the messy output of built-in R functions into tidy dataframes
# Create edge dataframe.
costa_edges = tidy(costa_sim)
# Edges dataframe has to be named this way for visNetwork input.
colnames(costa_edges) = c("from", "to", "value")
head(costa_edges)
# A tibble: 6 x 3
from to value
<fct> <fct> <dbl>
1 1 2 0.977
2 1 4 0.996
3 1 8 0.975
4 1 12 0.962
5 1 16 0.958
6 1 20 0.989
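If tidy is not available for dist objects in your installed version of broom, the same from / to / value layout can be assembled in base R. This is a sketch under that assumption, using a made-up similarity object; `toy_dist`, `toy_sim`, `sim_matrix`, `lower`, and `toy_edges` are hypothetical names.

```r
# Toy similarity object standing in for `costa_sim`; row names mimic
# the non-consecutive row ids seen above.
toy_dist = dist(data.frame(x = c(0, 1, 3), row.names = c("1", "2", "4")))
toy_sim = 1 - toy_dist / max(toy_dist)

# Expand to a full matrix so rows and columns can be indexed by label.
sim_matrix = as.matrix(toy_sim)

# Row/column index pairs of the lower triangle (row > column).
lower = which(lower.tri(sim_matrix), arr.ind = TRUE)

# One row per pair of nodes, named the way visNetwork expects.
toy_edges = data.frame(from  = rownames(sim_matrix)[lower[, "col"]],
                       to    = rownames(sim_matrix)[lower[, "row"]],
                       value = sim_matrix[lower],
                       stringsAsFactors = FALSE)
toy_edges
```

Only the lower triangle is read, so the diagonal that as.matrix fills in is never used.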
# Keep only edges whose similarity is above the median; we chose this
# threshold because it gave the clearest visualization.
costa_edges = subset(costa_edges, value > median(costa_edges$value))
# Arrange by order of edge thickness.
costa_edges = arrange(costa_edges, desc(value))
# Keep only the top 200 (`head` avoids NA rows if fewer than 200 remain).
costa_edges = head(costa_edges, 200)
The nodes dataframe for visNetwork must have an id column
We take the ids from the from and to columns of the edges dataframe
# Get unique nodes from edges dataframe and combine them
costa_nodes_from = data.frame(id = unique(costa_edges$from))
costa_nodes_to = data.frame(id = unique(costa_edges$to))
costa_nodes = rbind(costa_nodes_from,costa_nodes_to)
# Retain unique nodes in case nodes are repeated in `from` and `to` columns
costa_nodes = unique(costa_nodes)
# Add color to the nodes dataframe based on Target value from original dataframe
costa_small = select(costa_small, Target) #<- we only need the target info
costa_small$id = rownames(costa_small)
# Merge nodes dataframe with the dataframe with Target value
costa_nodes = merge(costa_nodes, costa_small,
                    by = "id", all.x = TRUE) #<- merge() needs the `id` column to
                                             #   join the two dataframes
# Assign color to nodes based on the Target value
costa_nodes$color = factor(costa_nodes$Target,                                      #<- create a factor
                           labels = c("orange", "darkblue", "maroon", "seagreen"),  #<- assign color
                           levels = c(1, 2, 3, 4))                                  #   based on Target value
# We do not need the Target column anymore.
costa_nodes = select(costa_nodes, c(id, color))
head(costa_nodes)
id color
1 1004 seagreen
2 1049 seagreen
3 1061 maroon
4 1083 seagreen
5 1095 darkblue
6 1101 seagreen
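The factor call above stores color as a factor column. If a plain character column is preferred, a named lookup vector gives the same mapping. The sketch below is one possible alternative, shown on made-up Target values; `toy_target`, `target_colors`, and `toy_color` are hypothetical names.

```r
# Toy Target values (1 to 4), standing in for `costa_nodes$Target`.
toy_target = c(4, 4, 3, 2, 1)

# Named lookup vector: Target value -> color, same palette as above.
target_colors = c("1" = "orange", "2" = "darkblue",
                  "3" = "maroon", "4" = "seagreen")

# Index by the Target value as a character to get a character vector.
toy_color = unname(target_colors[as.character(toy_target)])
toy_color
```

The lookup makes the value-to-color pairing explicit in one place, which can be easier to audit than matching the positions of `labels` and `levels`.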
| Objective | Complete |
|---|---|
| Summarize the concepts of distance matrix and network visualization | ✔ |
| Create a distance matrix for a given dataset | ✔ |
| Create nodes and edges dataframes | ✔ |
| Build and customize a Network HTMLwidget | |
We build the network from the nodes and edges dataframes
# Create network.
costa_network = visNetwork(costa_nodes,  #<- set nodes
                           costa_edges)  #<- set edges
costa_network
Zoom in to see the individual nodes.
Remember that the green nodes are the non-vulnerable households, while the remaining nodes are vulnerable households.
Polling questions:
We notice that there are:
The nodes for vulnerable households do not appear very connected to the nodes for non-vulnerable households
This could tell us that these households are not very similar
We can add extra functionality using visOptions such as:
The entire list of visOptions can be found here
Click on individual nodes to highlight them and select IDs from the dropdown
# Add options to the network visualization
costa_network = visNetwork(costa_nodes,        #<- set nodes
                           costa_edges) %>%    #<- set edges
  visOptions(highlightNearest = TRUE,          #<- highlight nearest nodes
                                               #   when clicking on a node
             nodesIdSelection = TRUE)          #<- create a dropdown menu to
                                               #   select particular nodes
costa_network
# Set working directory to where you save interactive plots.
setwd(plot_dir)
# Load the library.
library(htmlwidgets)
# Save desired interactive plot to an HTML file.
saveWidget(costa_network,         #<- plot object to save
           "network.html",        #<- name of the file the plot is saved to
           selfcontained = TRUE)  #<- set `selfcontained` to TRUE, so that
                                  #   all necessary files and scripts are embedded
                                  #   into the HTML file itself
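Changing the working directory with setwd affects every later relative path in the session. A possible alternative is to pass a full path built with file.path. The sketch below writes a toy widget to a temporary folder; `out_dir`, `out_file`, and `toy_network` are hypothetical names, and in the course setup you would presumably use plot_dir in place of tempdir(). It only runs if both packages are installed.

```r
# Hypothetical output location; swap `tempdir()` for `plot_dir` in practice.
out_dir = tempdir()
out_file = file.path(out_dir, "network.html")

if (requireNamespace("visNetwork", quietly = TRUE) &&
    requireNamespace("htmlwidgets", quietly = TRUE)) {
  # Tiny two-node network, just to have something to save.
  toy_network = visNetwork::visNetwork(data.frame(id = 1:2),
                                       data.frame(from = 1, to = 2))
  # `selfcontained = FALSE` writes supporting files to a folder next to
  # the HTML file instead of embedding them.
  htmlwidgets::saveWidget(toy_network, out_file, selfcontained = FALSE)
  file.exists(out_file)
}
```

With selfcontained = TRUE the result is a single file that is easier to share, at the cost of a larger HTML file.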
| Objective | Complete |
|---|---|
| Summarize the concepts of distance matrix and network visualization | ✔ |
| Create a distance matrix for a given dataset | ✔ |
| Create nodes and edges dataframes | ✔ |
| Build and customize a Network HTMLwidget | ✔ |
In this module we built a network HTML widget with visNetwork; other htmlwidgets packages include leaflet for maps and dygraphs for time series